Abstract. We study a model of associative memory based on a neural network with small-world structure.
The efficacy of the network to retrieve one of the stored patterns exhibits a phase transition at a finite value 
of the disorder. The more ordered networks are unable to recover the patterns, and are always attracted 
to mixture states. Besides, for a range of the number of stored patterns, the efficacy has a maximum at 
an intermediate value of the disorder. We also give a statistical characterization of the attractors for all 
values of the disorder of the network. 

PACS. 84.35.+i Neural networks - 89.75.Hc Networks and genealogical trees - 87.18.Sn Neural networks



1 Small-world neural networks 

Artificial neural networks have been used as a model for 
associative memory since the 1980s, and a considerable
amount of work has been done in the field. Most of
this work regards both the simulation and the theory of 
completely connected networks, as well as networks with a 
random dilution of the connectivity. It is known that par- 
ticular prescriptions for the determination of the synap- 
tic weights enable these systems to successfully retrieve a 
pattern out of a set of memorized ones. This behavior is 
observed in the system up to a certain value of the num- 
ber of stored patterns, beyond which the network becomes 
unable to retrieve any of them. For reasons of simplicity 
of the models and their analytical tractability, complex 
architectures of the networks, more akin to those found in 
biological neural systems, have been largely left out of the 
theoretical analysis. In recent years, however, a class of
models that has come to be known as "complex networks"
has begun to be studied thoroughly. Complex networks seem
more compatible with the geometrical properties of many
biological and social phenomena than regular
lattices, random networks, or completely connected sys- 
tems HIKKj . Already in the seminal work of Watts and 
Strogatz [3], whose small-world model combines properties
of regular and random networks, it was observed that
the neural system of the nematode C. elegans shares topological
properties with these model networks.

In this paper we study a neural network built upon the
Watts-Strogatz model for small worlds. The model interpolates
between regular and random networks by means of
a parameter p, which characterizes the disorder of the network.
The construction, as formulated in Ref. [3], begins
with a one-dimensional regular lattice of N nodes, each
one linked to its K nearest neighbors to the right and
to the left, with periodic boundary conditions. With
probability p, each of the right-pointing links of every
node is rewired to a randomly chosen node in the network.
Self-connections and repeated connections are not
allowed. The result is a disordered network, defined by
the set {N, K, p}, that lies between a regular lattice (p = 0)
and a random graph (p = 1). A wide range of these networks
displays high local clustering and a short average
distance between nodes, as do many real complex networks.
They can be described by the connectivity matrix $c_{ij}$, where
$c_{ij} = 1$ if there is a link between nodes i and j, and $c_{ij} = 0$
otherwise. We use this matrix to establish the synaptic
connections between neurons, in contrast to the traditional
Hopfield model, where the network is completely
connected and the connectivity matrix is $c_{ij} = 1$, $\forall i, j$. At
p = 1 the model coincides with the standard diluted disordered
networks, which have also been considered in the literature,
in which randomly chosen elements of the connectivity
matrix are set to zero.
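
The construction described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used for the simulations: the function name `small_world`, the adjacency-set representation, and the order in which the links are visited are choices made for this example.

```python
import random

def small_world(n, k, p, rng):
    """Ring of n nodes, each linked to its k nearest neighbours on either side;
    every right-pointing link is then rewired with probability p.
    Returns a list of neighbour sets (a symmetric adjacency structure)."""
    if n <= 2 * k:
        raise ValueError("need n > 2k so that the ring has no repeated links")
    adj = [set() for _ in range(n)]
    # Regular lattice with periodic boundary conditions.
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    # Rewire each right-pointing link (i, i + d) with probability p.
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            if rng.random() >= p:
                continue
            # Exclude self-connections and repeated connections.
            free = [m for m in range(n) if m != i and m not in adj[i]]
            if not free:
                continue
            new = rng.choice(free)
            adj[i].remove(j)
            adj[j].remove(i)
            adj[i].add(new)
            adj[new].add(i)
    return adj

rng = random.Random(0)
ring = small_world(20, 3, 0.0, rng)   # p = 0: regular lattice
mixed = small_world(20, 3, 0.5, rng)  # intermediate disorder
```

Rewiring moves links without creating or destroying them, so the total number of links stays at NK for every p, while individual node degrees fluctuate around 2K once p > 0.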

The biological neuron carries out an operation on the 
inputs provided by other neurons, and it produces an out- 
put. A transformation of this continuous output into a 
binary variable makes it possible to formulate a simplified 
model in which the neurons are logical elements (Ref. p^, 
chapter 2). In this binary representation, the state of each
neuron is characterized by a single variable $s_i$. This variable
can take two values representing the active and the
inactive states,

$$ s_i = \begin{cases} +1 & \text{if the neuron is active,} \\ -1 & \text{if the neuron is inactive.} \end{cases} \tag{1} $$
The purpose of an associative memory model is to
retrieve patterns that have been stored in the network
by an unspecified learning process. The stored (or
memorized) patterns are represented by network states
$\xi^\mu$, where $\mu = 1, \dots, M$ labels the different patterns and
M is their number. As usual, the patterns are generated
at random, assigning with equal probability 1/2 the values
$\xi_i^\mu = \pm 1$. The patterns are uncorrelated and thus orthogonal
in large networks:

$$ \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu \xi_i^\nu = \delta_{\mu\nu}. \tag{2} $$

The state of the neurons is updated asynchronously, as 
in Glauber dynamics. At each simulation step, a neuron 
is chosen at random, and its new state is determined by 
the local field: 

$$ h_i = \sum_{j=1}^{N} w_{ij} s_j \tag{3} $$

according to:

$$ s_i = \mathrm{sign}(h_i). \tag{4} $$

The synaptic weights $w_{ij}$ of the connections are given
by Hebb's rule, restricted to the synapses actually present
in the network, as given by the connectivity matrix:

$$ w_{ij} = \frac{1}{N}\, c_{ij} \sum_{\mu=1}^{M} \xi_i^\mu \xi_j^\mu \tag{5} $$

for i, j = 1, ..., N. Note that, as the network model does
not allow self-connections, the diagonal matrix elements
are null. By definition, the synaptic matrix is symmetric.

The dynamics prescribed by Eqs. (3) and (4) is deterministic,
and the network is not subject to thermal fluctuations.
We will only consider the effects of a small amount
of additive noise to verify the robustness of our results. A
full discussion of the effect of a finite temperature on the
dynamics is left for future work. The stochastic asynchronous
update, though, prevents the system from having
limit cycles, and the only attractors are fixed points.
The stored patterns $\xi^\mu$ are, by construction of the synaptic
weights (5), fixed points of the dynamics due to the
orthogonality condition. In the model, "memory" is the
capacity of the network to retrieve one of the stored patterns
from an arbitrary initial condition. As in traditional
models, the reversed patterns ($-\xi^\mu$), as well as a wealth of
symmetric and asymmetric mixtures of patterns, are also
equilibria of the system and play a significant role in its
behavior as a memory device.
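
The ingredients of this section (Hebb's rule restricted to existing links, the local field, and the asynchronous sign update) can be put together in a short sketch. Everything named here is an assumption made for illustration: the helper names, the tie-break sign(0) = +1, the 1/N normalization (which does not affect the sign dynamics), and the use of a plain ring lattice with a single stored pattern so that the outcome is easy to verify by hand.

```python
import random

def ring_lattice(n, k):
    """Neighbour sets of a ring where each node links to k neighbours per side."""
    return [{(i + d) % n for d in range(-k, k + 1) if d != 0} for i in range(n)]

def hebb_weights(patterns, adj):
    """Hebb's rule restricted to existing links; w[i] maps neighbour j to w_ij."""
    n = len(adj)
    return [{j: sum(x[i] * x[j] for x in patterns) / n for j in adj[i]}
            for i in range(n)]

def sign(h):
    return 1 if h >= 0 else -1  # tie-break at h = 0 is an assumption

def local_field(w, s, i):
    return sum(w_ij * s[j] for j, w_ij in w[i].items())

def is_fixed_point(w, s):
    return all(s[i] == sign(local_field(w, s, i)) for i in range(len(s)))

def relax(w, s, rng, max_sweeps=1000):
    """Asynchronous updates of randomly chosen neurons until a fixed point."""
    s = list(s)
    n = len(s)
    for _ in range(max_sweeps):
        if is_fixed_point(w, s):
            break
        for _ in range(n):
            i = rng.randrange(n)
            s[i] = sign(local_field(w, s, i))
    return s

rng = random.Random(1)
n, k = 60, 3
adj = ring_lattice(n, k)
xi = [rng.choice((-1, 1)) for _ in range(n)]  # one random stored pattern
w = hebb_weights([xi], adj)

noisy = list(xi)
noisy[10] = -noisy[10]            # corrupt a single neuron
recovered = relax(w, noisy, rng)  # the dynamics restores the pattern
```

With a single stored pattern every local field is aligned with the pattern, so both the pattern and its reversed companion are fixed points, and the single corrupted neuron is corrected as soon as it is selected for update.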



2 Effect of the disordered topology 

We have performed extensive numerical simulations of the
system, starting from a random unbiased initial condition.
After a transient, a fixed point is reached, after which no further
changes occur in any neuron. In order to measure the
ability of the network to recall a number M of stored random
patterns, we define the efficacy $\varphi$ as the fraction of realizations
in which one of the stored patterns is retrieved.
In Fig. 1 we plot the order parameter $\varphi$ as a function of the
disorder parameter p. The different curves correspond to
different numbers of stored patterns, M = 1, 2, 5, 10, and
20. For this plot we have used $N = 5 \times 10^3$ and K = 100.
Averages have been taken over $10^4$ realizations. For each
realization we use different patterns, as well as different
initial conditions. Figure 1 shows that on highly ordered
networks the system does not retrieve any stored pattern.
Then there is a transition as the disorder parameter p
grows, and above some critical value of p, patterns are
retrieved as fixed points, yielding $\varphi > 0$. For M = 1 and
M = 2, $\varphi = 1$ above $p \approx 0.4$. But for M > 2 we find that
$\varphi$ does not grow monotonically with p. Instead, it decays
as p grows after reaching a maximum value. This surprising
nonmonotonic behavior with the disorder parameter
p has been observed before in a problem of biased diffusion
[7], and in an Ising model [8], both with asymmetric
interactions.

Fig. 1. Efficacy to retrieve a memorized pattern, $\varphi$, as a function
of the disorder p. The curves correspond to different numbers
of stored patterns: (squares) M = 1, (circles) M = 2,
(up triangles) M = 5, (down triangles) M = 10, (diamonds)
M = 20. Inset: The efficacy as a function of the number of
stored patterns, at p = 1. Simulation parameters: $N = 5 \times 10^3$,
K = 100, $10^4$ realizations per point.

In the inset of Fig. 1 we plot $\varphi$ vs. M for a disordered
network with p = 1. As the number of stored patterns
M grows, the network becomes unable to retrieve them. The
curve also shows a nonmonotonic behavior with M. The
transition as the number of stored patterns grows has al- 
ready been studied in diluted disordered networks (Ref. 
PQ, chapter 7). It is known that random dilution reduces 
the capacity of a neural network in proportion
to the fraction of available connections. For our system
(which is very diluted) the transition, then, takes place
at $M_c \approx 0.15\,(K/N)\,N = 15$, as observed.

Fig. 2. Efficacy $\varphi$ as a function of the disorder parameter p,
for systems of different sizes (as shown in the legend), and
K = 100. The number of stored patterns is M = 5, with $10^4$
realizations per point. Inset: The same curves, scaled with the
system size according to Eq. (6), collapse to a single curve $\Phi$,
with $p_c = 0.333$ and a = 0.2.

Nevertheless, we are mostly interested in the behavior of the system
regarding the different topologies characterized by p. The
fact that the transition between the remembering and the
non-remembering phases occurs at a finite value of the
disorder parameter is very interesting, since a few dynamical
systems based on small-world architectures show it
[9,10,11]. This occurs in spite of the fact that the average
distance between nodes, the main geometrical property of
the Watts-Strogatz model, has a transition at p = 0 [12].
Indeed, for several Ising-like systems, which bear some
similarities with artificial neural networks, a phase transition
occurs at p = 0 [12,13,14,15].

In order to understand the finite size effects in the 
system, and the behavior of the transition in the limit of 
an infinite system, we have made simulations on systems 
of different sizes. We have chosen to keep the connectivity 
parameter of the model constant through all the results we 
show, K = 100. In this regard, our results correspond to a 
neural network characterized by certain properties at the 
local level, for example the average connectivity of each 
neuron (2K in our systems). Our finite size analysis shows 
the behavior of these networks in systems of increasing size
N and in the limit $N \to \infty$.

The plot of $\varphi$ vs. p for different values of N is shown in
Fig. 2. For these curves we have set K = 100 and M = 5,
averaging over $10^4$ independent realizations. As seen in
the figure, all the curves seem to cross at the same value
of the disorder parameter, $p = p_c \approx 0.333$.

Based on numerical evidence, we find that the dependence
of the efficacy on the system size can be built into
a scaling function:

$$ \varphi(p, N) = \Phi\left[(p - p_c)\, N^{a}\right]. \tag{6} $$

At the point of crossing of the curves, $\varphi$ becomes independent
of N.



Since the order parameter is not singular at the transition,
we can expand $\Phi$ as a Taylor series around the critical
control parameter $p_c$:

$$ \varphi(p, N) = \Phi(0) + \Phi'(0)\,(p - p_c)\, N^{a} \tag{7} $$

to first order in $(p - p_c)$. Defining $\tilde\varphi = \varphi - \Phi(0)$ and
$\tilde p = p - p_c$ we can write:

$$ \left. \frac{d\tilde\varphi}{d\tilde p} \right|_{\tilde p = 0} (N) = \Phi'(0)\, N^{a}. \tag{8} $$



Plotting on a log-log scale the derivative $d\tilde\varphi/d\tilde p\,|_{0}$ vs.
N, we obtain the exponent a as the slope of the line. Using
data from $N = 2 \times 10^3$ to $N = 10^5$, we find $a = 0.23 \pm 0.04$,
and $\Phi'(0) = 0.096 \pm 0.016$. In the inset of Fig. 2 we plot the
rescaled curves for different N. The best data collapse is
obtained with a = 0.2, compatible with the above result.
Observe that the data corresponding to $N = 10^3$ (squares)
fail to match the scaling curve, indicating a lower bound
of what can be considered a "large" system for this model.
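
The exponent extraction described above amounts to a least-squares straight-line fit in log-log coordinates. The sketch below illustrates the procedure on synthetic data only: the intermediate system sizes, the values `a_true` and `amp_true`, and the function name `loglog_slope` are hypothetical and are not the measured data of this work.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares fit of log(y) = log(amp) + a * log(x); returns (a, amp)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    a = sxy / sxx
    amp = math.exp(my - a * mx)
    return a, amp

# Synthetic slopes at the critical point obeying an exact power law in N.
sizes = [2000, 5000, 10000, 20000, 50000, 100000]  # hypothetical sizes
a_true, amp_true = 0.2, 0.1                         # illustrative values only
slopes = [amp_true * n ** a_true for n in sizes]

a_fit, amp_fit = loglog_slope(sizes, slopes)
```

On noiseless power-law input the fit returns the exponent and the prefactor used to generate the data; with simulation data the scatter of the points around the line would set the uncertainty of both.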

Except in the relatively narrow range of p where $\varphi \approx 1$,
the system fails to retrieve any stored pattern in a significant
fraction of the realizations: almost always when the
network is very ordered (down to p = 0), and about 12%
of the time when the network is very disordered (up to
p = 1). What happens in the phase space as the network
architecture changes? What happens to the trajectories,
and why are the patterns missed? It seems natural to expect
that the energy landscape is different for p = 0 than
for p = 1. To address this problem we turn our attention
to the properties of the overlaps of the equilibrium state
with the memorized patterns. Suppose that after a transient
the network has reached a fixed point $\zeta$. We define
the overlap of this fixed point with the patterns as



$$ \theta^\mu = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu \zeta_i. \tag{9} $$



Note that if the fixed point is a stored pattern, $\zeta = \xi^\nu$,
then $\theta^\nu = 1$. In order to determine the type of fixed points
that are reached when the network misses the patterns, we
measure the overlap $\theta^\mu$ of the fixed point with the stored
patterns $\xi^\mu$. The probability distribution $P(\theta)$ of these
overlaps gives information on the kind of mixture that
the fixed point is. Figure 3 shows the overlap distributions
for several levels of disorder in the network. In these
plots, $N = 2 \times 10^3$, K = 100, M = 5 and $10^6$ realizations
are used per curve. For the three higher values of p,
the distributions have a high peak at $\theta = 1$, which is not
shown for reasons of scale. This peak corresponds to the
realizations that end up in a pattern, which happens frequently
whenever $p > p_c$, as seen in Fig. 2. The somewhat
broader peak that these distributions have at low values
of $\theta$ has the same origin, since the overlaps with the other
M - 1 patterns have a low value whenever a pattern is
reached. Indeed, the overlap of two uncorrelated states
has a mean value $\theta = 0.022$. In the intermediate range of
$\theta$, the distribution presents a broad bump around $\theta = 1/2$.
This corresponds to symmetric mixtures of the patterns,
although the width of this bump suggests that asymmetric
mixtures are present as well. In particular, the smaller
peak present around $\theta \approx 0.35$ for the completely random
network corresponds to asymmetric mixtures. In contrast
with these three cases (at and above the critical point),
for ordered networks with p = 0 the overlap distribution is
broad and does not have a peak at $\theta = 1$. It has a maximum
at $\theta = 0$ and decays as $\theta$ grows, but large overlaps are
observed in some realizations, as the distribution shows.
This is the only curve for which the complete distribution
is shown. As the distribution suggests, the fixed points of
these systems consist of very asymmetric mixtures.
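
The overlap and its value for a symmetric mixture can be illustrated with a short sketch. The example assumes the textbook symmetric mixture of three patterns, built by a majority vote over the three stored values at each neuron; it is not claimed that this particular state is the one reached in the simulations, and the names `overlap` and `mixture` are chosen for the example.

```python
import random

def overlap(state, pattern):
    """Normalized scalar product (1/N) * sum_i pattern_i * state_i."""
    return sum(s * x for s, x in zip(state, pattern)) / len(state)

rng = random.Random(2)
n = 4000
patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(3)]

# Symmetric mixture of three patterns: majority vote at each neuron.
# The sum of three values +-1 is odd, so it is never zero.
mixture = [1 if a + b + c > 0 else -1 for a, b, c in zip(*patterns)]

self_overlap = overlap(patterns[0], patterns[0])       # a retrieved pattern
cross_overlap = overlap(patterns[0], patterns[1])      # two uncorrelated states
mixture_overlaps = [overlap(mixture, x) for x in patterns]
```

For large N the three mixture overlaps approach 1/2, which is where the broad bump of the distributions is located, while the overlap between two independent patterns is of order $1/\sqrt{N}$.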

The previous analysis unveiled the structure of the
phase space and the difference between the low and the
high p regimes. Still, what is the reason for the catastrophic
loss of memory below the critical value of disorder?
We have found that, for low values of disorder,
the fixed points retrieve scattered pieces of several stored
patterns. These fixed points consist of localized regions
that overlap with different patterns. Indeed, at p = 0, the
network is topologically highly clustered, and there exist
local neighborhoods relatively isolated from each other.
These neighborhoods begin to disappear through the action of
the shortcuts provided by the random rewiring at higher
values of p, until the whole system becomes essentially a
single neighborhood. Thus, at p = 0, from an arbitrary initial
condition, different regions of the network eventually
align themselves with different patterns. The final result
is a completely asymmetric mixture, impossible to classify
due to the arbitrariness of its origin and nature. These
are the states that the broad distribution of overlaps describes,
in Fig. 3 for p = 0. The existence of asymmetric
mixtures as attractors in this kind of associative mem- 
ory model have been observed before (see for example pQ, 

Fig. 3. Distribution of overlaps $P(\theta)$ after a fixed point has
been achieved, between the state of the system and all stored
patterns. Each curve corresponds to a value of p, as shown in
the legend, typical of the different memory behaviors observed.
A large peak at $\theta = 1$ (perfect retrieval of a pattern) is not
shown for reasons of scale (see discussion in the text).



chapter 4). But since they are very rare in the completely 
random or in the completely connected networks, they are 
very difficult to observe. In the present context, however, 
they play an essential role in the destruction of the ability 
of the system to retrieve the patterns. 

In order to quantify this, we proceed to define a correlation
measure that provides a clear picture of the situation.
We introduce the difference of the fixed point $\zeta$ with
a given pattern:

$$ d_i^\mu = \begin{cases} 1 & \text{if } \xi_i^\mu = \zeta_i, \\ -1 & \text{if } \xi_i^\mu \neq \zeta_i. \end{cases} \tag{10} $$



Then we define a local magnetization for the difference
vector $d^\mu$, for every node i:



1 



(11) 



where $V_i$ is the set of neighbors of node i. The local magnetization
$m_i^\mu$ measures the local alignment with the $\mu$
pattern or its reversed companion. The maximum value
$m_i^\mu = 1$ arises when $d_j^\mu = d_i^\mu$ $\forall j \in V_i$. The presence of
connected domains where the fixed point $\zeta$ overlaps with
the $\xi^\mu$ pattern should be detected as short-range correlations
between the local magnetizations. The correlation
between the local magnetizations of the difference vector
with the $\mu$ pattern is then defined as:



1 N 

-Y 



1 



(12) 



As we intend to capture the existence of correlations in
the difference with patterns that appear in the mixture
that makes up the fixed point, we define the maximum
correlation

$$ C = \max_\mu \{ C^\mu \}. \tag{13} $$

Figure 4 presents the probability distribution P(C) for
different levels of network disorder. Each distribution is
constructed over $10^6$ realizations of $N = 2 \times 10^3$ networks,
with connectivity K = 100. For p = 0 we observe a broad
peak centered around $C \approx 0.3$. This is a quantitative measure
of the occurrence of correlations on ordered networks,
as we pointed out. For the other values of p considered in
the figure, the distribution has a sharp peak at C = 1,
which we have not shown for reasons of scale, corresponding
to the fixed points that coincide with a pattern and
consequently give the highest possible value of the correlation.
Besides this peak, the most disordered systems show
a narrow peak at $C \approx 0.25$, and the curve for p = 1 also shows
a smaller one at $C \approx 0.15$. These two peaks correspond
to symmetric and asymmetric mixtures, respectively. For
p = 0.333, very close to the critical point, the distribution
presents a very small bump at $C \approx 0.3$. It is easy
to see, from the extended region of P(C) in the curve for
p = 0, that the mixtures are characterized by higher local
correlation in the ordered system than in the disordered
ones.

Fig. 4. Distribution of the local correlation that characterizes
the level of alignment with a stored pattern [Eq. (13)]. System
parameters: $N = 2 \times 10^3$, K = 100, $10^6$ realizations per curve.
A peak at C = 1, shared by the three curves with the higher
values of p, is not shown for reasons of scale (see discussion in
the text).



3 Discussion 

We have studied a model of associative memory based 
on neural networks with a complex topology. This kind 
of connectivity can be considered as more similar to the 
biological networks than the completely connected or ran- 
domly diluted networks. Many of the general features of 
these systems are preserved: the network is able to retrieve 
a memorized pattern, up to a saturation. Besides, we have 
found a critical dependence of the efficacy of retrieval on 
the disorder parameter of the network: a collapse of the 
memory capability takes place at a finite value of the dis- 
order parameter. The optimal performance of the system 
occurs at an intermediate value of the disorder, just above 
the critical value. This enhanced performance occurs far 
away from the region of p = 1, which is equivalent to the 
well known models of completely connected or randomly 
connected neural networks. We have characterized the different
phases by the properties of the mixture states, which
prevent the system from reaching one of the memorized states.

We attribute the failure of the more ordered
networks to retrieve a stored pattern to the partition
of the system into arbitrary neighborhoods aligned with
more than one pattern. This is something that the disor- 
dered networks cannot do, and in fact the distributions of 
the overlaps and of the correlations quantify this effect. 
It does not escape us that we cannot, at this stage, pro- 
vide an explanation of the enhanced performance of the 
intermediate region. 

We have checked the robustness of our results with respect
to a small amount of noise in the dynamics. This
has been implemented by flipping, with probability $\epsilon$, one
neuron at random after each deterministic step. For values
of $\epsilon$ up to 0.01, the results are indistinguishable from
those of the noiseless system. For greater values of $\epsilon$ the system
becomes less and less effective at retrieving a pattern,
but the general form of the curves $\varphi(p)$ is preserved for
the whole range of p. A systematic analysis of the problem
of a truly noisy network, characterized by a temperature,
remains to be done.
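
The noise prescription used in this robustness check can be written as a single update step. The following is a hedged sketch: the function name `noisy_step`, the dictionary representation of the weights, the tie-break sign(0) = +1, and the three-neuron toy network are all assumptions made for illustration.

```python
import random

def sign(h):
    return 1 if h >= 0 else -1  # tie-break at h = 0 is an assumption

def noisy_step(w, s, eps, rng):
    """One deterministic asynchronous update of a random neuron, followed,
    with probability eps, by the flip of one neuron chosen at random.
    The state list s is modified in place and returned."""
    n = len(s)
    i = rng.randrange(n)
    h = sum(w_ij * s[j] for j, w_ij in w[i].items())
    s[i] = sign(h)
    if rng.random() < eps:
        j = rng.randrange(n)
        s[j] = -s[j]
    return s

# Toy network: three fully connected neurons storing the pattern (1, 1, 1).
w = [{1: 1 / 3, 2: 1 / 3}, {0: 1 / 3, 2: 1 / 3}, {0: 1 / 3, 1: 1 / 3}]
rng = random.Random(3)

quiet = [1, 1, 1]
for _ in range(50):
    noisy_step(w, quiet, 0.0, rng)   # eps = 0: the noiseless dynamics

kicked = noisy_step(w, [1, 1, 1], 1.0, rng)  # eps = 1: a flip always follows
```

With eps = 0 the stored pattern is left untouched, while with eps = 1 exactly one neuron is flipped after the deterministic update; intermediate values interpolate between the two.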